Thesis
Datasets
UCI and Kaggle datasets
- Numerical
- Binary
- Less than 10 attributes
- 10 or more attributes
- Multiclass
- Less than 10 attributes
- Binary
- Mixed
- Binary
- Multiclass
Keel datasets
- Imbalanced
- Binary
- Imbalance ratio between 1.5 and 9
- Imbalance ratio higher than 9
- Multiclass
- Binary
- Noisy
- [cn] Class noise
- [an] Attribute noise
- [an_nn] noisy train, noisy test
- 5% noise
- 20% noise
- [an_nc] noisy train, clean test
- 5% noise
- 20% noise
- [an_cn] clean train, noisy test
- 5% noise
- 20% noise
Data
Noisy data
Numerical
Binary
[1] "banknote"
[1] "haberman"
[1] "skin"
[1] "vertebral_column2"
[1] "weight_height"
[1] "audit_risk"
[1] "ionospheren"
[1] "sonar"
# A tibble: 8 x 9
name type instances features num_cat classes class_names proportion
<chr> <fct> <int> <dbl> <chr> <int> <chr> <chr>
1 bank… nume… 1372 4 [4/0] 2 [1/0] [0.44/0.5…
2 habe… nume… 306 3 [3/0] 2 [2/1] [0.26/0.7…
3 skin nume… 245057 3 [3/0] 2 [1/2] [0.21/0.7…
4 vert… nume… 310 6 [6/0] 2 [Normal/Ab… [0.32/0.6…
5 weig… nume… 10000 2 [2/0] 2 [Male/Fema… [0.5/0.5]
6 audi… nume… 776 24 [24/0] 2 [1/0] [0.39/0.6…
7 iono… nume… 351 32 [32/0] 2 [b/g] [0.36/0.6…
8 sonar nume… 208 60 [60/0] 2 [R/M] [0.47/0.5…
# … with 1 more variable: imbalance_ratio <dbl>
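The "< 10" and ">= 10" groups that follow split these datasets by number of features. As a minimal sketch in Python (the function name `split_by_feature_count` and the dictionary are illustrative, not part of the original analysis), the split can be reproduced from the feature counts in the summary table:

```python
# Feature counts copied from the summary table above.
FEATURES = {
    "banknote": 4,
    "haberman": 3,
    "skin": 3,
    "vertebral_column2": 6,
    "weight_height": 2,
    "audit_risk": 24,
    "ionospheren": 32,
    "sonar": 60,
}


def split_by_feature_count(features, threshold=10):
    """Return (small, large): names with fewer than `threshold` features, then the rest."""
    small = [name for name, n in features.items() if n < threshold]
    large = [name for name, n in features.items() if n >= threshold]
    return small, large


small, large = split_by_feature_count(FEATURES)
print("< 10:", small)
print(">= 10:", large)
```

The threshold of 10 is taken from the section headings; any other grouping rule would need a different predicate.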
< 10
[1] "banknote"
[1] "haberman"
[1] "skin"
[1] "vertebral_column2"
[1] "weight_height"
# A tibble: 5 x 9
name type instances features num_cat classes class_names proportion
<chr> <fct> <int> <dbl> <chr> <int> <chr> <chr>
1 bank… nume… 1372 4 [4/0] 2 [1/0] [0.44/0.5…
2 habe… nume… 306 3 [3/0] 2 [2/1] [0.26/0.7…
3 skin nume… 245057 3 [3/0] 2 [1/2] [0.21/0.7…
4 vert… nume… 310 6 [6/0] 2 [Normal/Ab… [0.32/0.6…
5 weig… nume… 10000 2 [2/0] 2 [Male/Fema… [0.5/0.5]
# … with 1 more variable: imbalance_ratio <dbl>
Banknote authentication
Data were extracted from images taken of genuine and forged banknote-like specimens. For digitization, an industrial camera usually used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A wavelet transform tool was used to extract features from the images.
- Source: UCI Machine Learning Repository
- Classification: binary
- Input features: numerical
- Number of rows: 1372
- Number of attributes: 4
Description of the attributes:
- variance: variance of the wavelet-transformed image (numerical)
- skewness: skewness of the wavelet-transformed image (numerical)
- curtosis: kurtosis of the wavelet-transformed image (numerical)
- entropy: entropy of the image (numerical)
- class:
Data
# A tibble: 1,372 x 5
class variance skewness curtosis entropy
<fct> <dbl> <dbl> <dbl> <dbl>
1 0 3.62 8.67 -2.81 -0.447
2 0 4.55 8.17 -2.46 -1.46
3 0 3.87 -2.64 1.92 0.106
4 0 3.46 9.52 -4.01 -3.59
5 0 0.329 -4.46 4.57 -0.989
6 0 4.37 9.67 -3.96 -3.16
7 0 3.59 3.01 0.729 0.564
8 0 2.09 -6.81 8.46 -0.602
9 0 3.20 5.76 -0.753 -0.613
10 0 1.54 9.18 -2.27 -0.735
# … with 1,362 more rows
Haberman
The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
- Source: UCI Machine Learning Repository
- Number of rows: 306
- Number of attributes: 3
- Classification: binary
- Input features: numerical
Description of the attributes:
- age: age of the patient at the time of the operation (numerical)
- year: patient's year of operation (numerical)
- nodes: number of positive axillary nodes detected (numerical)
- class: survival status (class attribute)
  - 1 = the patient survived 5 years or longer [positive]
  - 2 = the patient died within 5 years
Skin segmentation
The skin dataset was collected by randomly sampling B, G, R values from face images of various age groups (young, middle, and old), race groups (white, black, and Asian), and genders, obtained from the FERET database and the PAL database. The total sample size is 245057, of which 50859 are skin samples and 194198 are non-skin samples. Image sources: Color FERET Image Database; PAL Face Database from the Productive Aging Laboratory, The University of Texas at Dallas.
- Source: UCI Machine Learning Repository
- Number of rows: 245057
- Number of attributes: 3
- Classification: binary
- Input features: numerical
Description of the attributes:
- red: numerical
- green: numerical
- blue: numerical
- class:
  - 1: it is a skin sample [positive]
  - 2: it is not a skin sample
Vertebral column 2
Biomedical data set built by Dr. Henrique da Mota during a medical residence period in the Group of Applied Research in Orthopaedics (GARO) of the Centre Médico-Chirurgical de Réadaptation des Massues, Lyon, France. The task consists of classifying patients as belonging to one of two categories: Normal (100 patients) or Abnormal (210 patients). Files are also provided for use within the WEKA environment.
The same data also define a three-class task: classifying patients as Normal (100 patients), Disk Hernia (60 patients) or Spondylolisthesis (150 patients).
- Source: UCI Machine Learning Repository
- Classification: binary
- Input features: numerical
- Number of rows: 310
- Number of attributes: 6
Description of the attributes:
>= 10
[1] "audit_risk"
[1] "ionospheren"
[1] "sonar"
# A tibble: 3 x 9
name type instances features num_cat classes class_names proportion
<chr> <fct> <int> <dbl> <chr> <int> <chr> <chr>
1 audi… nume… 776 24 [24/0] 2 [1/0] [0.39/0.6…
2 iono… nume… 351 32 [32/0] 2 [b/g] [0.36/0.6…
3 sonar nume… 208 60 [60/0] 2 [R/M] [0.47/0.5…
# … with 1 more variable: imbalance_ratio <dbl>
Audit risk
Many risk factors are examined from various areas, such as past records of the audit office, audit paras, environmental conditions reports, firm reputation summaries, ongoing issues reports, profit-value records, loss-value records and follow-up reports. After in-depth interviews with the auditors, important risk factors are evaluated and their probability of existence is calculated from present and past records.
The goal of the research is to help auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors. The sectors and their firm counts are: Irrigation (114), Public Health (77), Buildings and Roads (82), Forest (70), Corporate (47), Animal Husbandry (95), Communication (1), Electrical (4), Land (5), Science and Technology (3), Tourism (1), Fisheries (41), Industries (37), Agriculture (200).
- Source: UCI Machine Learning Repository
- Classification: binary
- Input features: numerical
- Number of rows: 776
- Number of attributes: 24
Description of the attributes:
- att1: numerical
- att2: numerical
- att3: numerical
- att4: numerical
- att5: numerical
- att6: numerical
- att7: categorical
- class:
  - Abnormal: [positive]
  - Normal:
EEG eye state
All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement and added manually to the file later, after analysing the video frames. ‘1’ indicates the eye-closed and ‘0’ the eye-open state. All values are in chronological order, with the first measured value at the top of the data.
- Source: UCI Machine Learning Repository
- Classification: binary
- Input features: numerical
- Number of rows:
- Number of attributes:
Description of the attributes:
Ionospheren
This radar data was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. See the paper for more details. The targets were free electrons in the ionosphere. “Good” radar returns are those showing evidence of some type of structure in the ionosphere. “Bad” returns are those that do not; their signals pass through the ionosphere.
Received signals were processed using an autocorrelation function whose arguments are the time of a pulse and the pulse number. There were 17 pulse numbers for the Goose Bay system. Instances in this database are described by 2 attributes per pulse number, corresponding to the complex values returned by the function resulting from the complex electromagnetic signal.
- Source: UCI Machine Learning Repository
- Classification: binary
- Input features: numerical
- Number of rows: 351
- Number of attributes: 32
Description of the attributes:
- X1-X34: numerical
- class:
  - Bad: [positive]
  - Good:
Sonar
The file “sonar.mines” contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file “sonar.rocks” contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.
Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.
The label associated with each record contains the letter “R” if the object is a rock and “M” if it is a mine (metal cylinder). The numbers in the labels are in increasing order of aspect angle, but they do not encode the angle directly.
- Source: UCI Machine Learning Repository
- Classification: binary
- Input features: numerical
- Number of rows: 208
- Number of attributes: 60
Description of the attributes:
- V1-V60: numerical
- class:
  - M: [positive]
  - R:
Multiclass
< 10
Ecoli
Description of the dataset
- Source: UCI Machine Learning Repository
- Classification: multiclass
- Input features: numerical
- Number of rows: 336
- Number of attributes: 7
Description of the attributes:
- mcg: numerical
- gvh: numerical
- lip: numerical
- chg: numerical
- aac: numerical
- alm1: numerical
- alm2: categorical
- class:
  - cp
  - im
  - imS
  - imL
  - imU
  - om
  - omL
  - pp
Iris
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Predicted attribute: class of iris plant.
This is an exceedingly simple domain.
This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick ‘@’ espeedaz.net). The 35th sample should be 4.9,3.1,1.5,0.2,“Iris-setosa”, where the error is in the fourth feature. The 38th sample should be 4.9,3.6,1.4,0.1,“Iris-setosa”, where the errors are in the second and third features.
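The two corrections described above can be applied mechanically once the rows are loaded. The following is a minimal Python sketch; the helper `apply_corrections` and the placeholder rows are hypothetical, and the real file is not read here.

```python
# Corrected rows as given in the text, keyed by 1-based sample number.
CORRECTIONS = {
    35: (4.9, 3.1, 1.5, 0.2, "Iris-setosa"),
    38: (4.9, 3.6, 1.4, 0.1, "Iris-setosa"),
}


def apply_corrections(rows, corrections=CORRECTIONS):
    """Return a copy of `rows` with the listed 1-based samples replaced."""
    fixed = list(rows)
    for sample_number, row in corrections.items():
        fixed[sample_number - 1] = row  # convert to a 0-based index
    return fixed


# Placeholder rows stand in for the 150 real samples (assumption for illustration).
rows = [(0.0, 0.0, 0.0, 0.0, "Iris-setosa")] * 150
fixed = apply_corrections(rows)
print(fixed[34])
print(fixed[37])
```

Returning a copy leaves the loaded data untouched, so the original and corrected versions can be compared.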
- Source: UCI Machine Learning Repository
- Classification: multiclass
- Input features: numerical
- Number of rows: 150
- Number of attributes: 4
Description of the attributes:
- Sepal.Length: numerical
- Sepal.Width: numerical
- Petal.Length: numerical
- Petal.Width: numerical
- class:
  - Iris-setosa
  - Iris-versicolor
  - Iris-virginica
Life expectancy
This dataset contains 6 columns and 223 rows. Each row corresponds to a country, ordered by life expectancy rank. The dataset has three numeric columns: Overall Life Expectancy, Male Life Expectancy and Female Life Expectancy. The last column is Continent, the continent in which the country lies, which can serve as the class for the data.
The data can be used for classification with techniques such as SVM (linear), KNN and C4.5, as well as with other supervised and unsupervised techniques.
- Source: Kaggle
- Classification: multiclass
- Input features: numerical
- Number of rows: 223
- Number of attributes: 3
Description of the attributes:
- overall: numerical
- male: numerical
- female: numerical
- class:
  - Europe
  - Asia
  - Oceania
  - North America
  - Africa
  - South America
Seeds
The examined group comprised kernels belonging to three different varieties of wheat: Kama, Rosa and Canadian, 70 elements each, randomly selected for the experiment. High quality visualization of the internal kernel structure was detected using a soft X-ray technique. It is non-destructive and considerably cheaper than other more sophisticated imaging techniques like scanning microscopy or laser technology. The images were recorded on 13x18 cm X-ray KODAK plates. Studies were conducted using combine harvested wheat grain originating from experimental fields, explored at the Institute of Agrophysics of the Polish Academy of Sciences in Lublin.
The data set can be used for the tasks of classification and cluster analysis.
- Source: UCI Machine Learning Repository
- Classification: multiclass
- Input features: numerical
- Number of rows: 210
- Number of attributes: 7
Description of the attributes:
- area: numerical
- perimeter: numerical
- compactness: C = 4*pi*A/P^2 (numerical)
- length_kernel: numerical
- width_kernel: numerical
- asymmetry_coefficient: numerical
- length_kernel_groove: numerical
- class:
  - 1
  - 2
  - 3
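The compactness attribute is defined from area A and perimeter P as C = 4*pi*A/P^2. As a minimal Python sketch (the function name `compactness` is illustrative and the inputs are geometric test shapes, not rows of the dataset), the formula can be checked on a circle, for which C equals 1, and on a unit square:

```python
import math


def compactness(area, perimeter):
    """Compactness C = 4*pi*A / P^2 for a shape with area A and perimeter P."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


# A circle of radius r has A = pi*r^2 and P = 2*pi*r, so C = 1.
r = 2.0
circle_c = compactness(math.pi * r ** 2, 2.0 * math.pi * r)

# A unit square has A = 1 and P = 4, so C = pi/4.
square_c = compactness(1.0, 4.0)

print(circle_c, square_c)
```

Values closer to 1 indicate a rounder outline, which is why the circle serves as the reference case.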
Vertebral column 3
See Vertebral column 2, although a description still needs to be written here.
Wifi localization
Collected to experiment with how wifi signal strengths can be used to determine one of several indoor locations.
- Source: UCI Machine Learning Repository
- Classification: multiclass
- Input features: numerical
- Number of rows: 2000
- Number of attributes: 7
Description of the attributes:
- att1: numerical
- att2: numerical
- att3: numerical
- att4: numerical
- att5: numerical
- att6: numerical
- class:
  - 1
  - 2
  - 3
  - 4
Yeast
Description of the dataset
- Source: UCI Machine Learning Repository
- Classification: multiclass
- Input features: numerical
- Number of rows: 1484
- Number of attributes: 8
The original has one more attribute, Sequence Name: accession number for the SWISS-PROT database.
Description of the attributes:
- mcg: McGeoch’s method for signal sequence recognition (numerical)
- gvh: von Heijne’s method for signal sequence recognition (numerical)
- alm: Score of the ALOM membrane spanning region prediction program (numerical)
- mit: Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins (numerical)
- erl: Presence of “HDEL” substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute (numerical)
- pox: Peroxisomal targeting signal in the C-terminus (numerical)
- vac: Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins (numerical)
- nuc: Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins (numerical)
- class:
  - CYT
  - ERL
  - EXC
  - ME1
  - ME2
  - ME3
  - MIT
  - NUC
  - POX
  - VAC
>=10
Mixed
Binary
<10
acute_inflammations1
acute_inflammations2
caesarian
mini_mammographic_masses
>= 10
statlog
Multiclass
< 10
abalone
teaching_assistant
>= 10
contraceptive
Categorical
Binary
balance_scale
breast_cancer
mini_cars
somerville
mini_tic_tac_toe
Multiclass
post_operative
mini_connect4
soybean_large
zoo
Keel
Imbalanced
Imbalanced data sets are a special case of classification problem in which the class distribution is not uniform among the classes. Typically, they are composed of two classes: the majority (negative) class and the minority (positive) class.
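The subsections below group datasets by imbalance ratio. As a minimal Python sketch, assuming the usual definition of the imbalance ratio as the size of the majority class divided by the size of the minority class (the function names are illustrative), the ratio and its group can be computed from the class labels. The example uses the class counts reported for Vertebral column 2 (100 Normal, 210 Abnormal):

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority class size divided by minority class size (assumed definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def ir_group(ir, low=1.5, high=9.0):
    """Map an imbalance ratio to the groups used in the section headings."""
    if ir > high:
        return "higher than 9"
    if ir >= low:
        return "between 1.5 and 9"
    return "below 1.5"


labels = ["Normal"] * 100 + ["Abnormal"] * 210
ir = imbalance_ratio(labels)
group = ir_group(ir)
print(ir, group)
```

With more than two classes this sketch compares the largest class with the smallest one, which is one common convention but not the only one.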
Binary
Binary: Imbalance ratio between 1.5 and 9
[1] "ecoli_0_vs_1"
[1] "iris0"
[1] "glass0"
[1] "glass1"
[1] "glass6"
[1] "haberman"
[1] "iris0"
[1] "wisconsin"
# A tibble: 8 x 9
name type instances features num_cat classes class_names proportion
<chr> <fct> <int> <dbl> <chr> <int> <chr> <chr>
1 ecol… nume… 220 7 [7/0] 2 [negative/… [0.35/0.6…
2 iris0 nume… 150 4 [4/0] 2 [positive/… [0.33/0.6…
3 glas… nume… 214 9 [9/0] 2 [positive/… [0.33/0.6…
4 glas… nume… 214 9 [9/0] 2 [positive/… [0.36/0.6…
5 glas… nume… 214 9 [9/0] 2 [positive/… [0.14/0.8…
6 habe… nume… 306 3 [3/0] 2 [2/1] [0.26/0.7…
7 iris0 nume… 150 4 [4/0] 2 [positive/… [0.33/0.6…
8 wisc… nume… 683 9 [9/0] 2 [positive/… [0.35/0.6…
# … with 1 more variable: imbalance_ratio <dbl>
ecoli_0_vs_1
class Mcg Gvh Lip Chg Aac Alm1 Alm2
1 positive 0.49 0.29 0.48 0.5 0.56 0.24 0.35
2 positive 0.07 0.40 0.48 0.5 0.54 0.35 0.44
3 positive 0.56 0.40 0.48 0.5 0.49 0.37 0.46
4 positive 0.59 0.49 0.48 0.5 0.52 0.45 0.36
5 positive 0.23 0.32 0.48 0.5 0.55 0.25 0.35
6 positive 0.67 0.39 0.48 0.5 0.36 0.38 0.46
7 positive 0.29 0.28 0.48 0.5 0.44 0.23 0.34
8 positive 0.21 0.34 0.48 0.5 0.51 0.28 0.39
9 positive 0.20 0.44 0.48 0.5 0.46 0.51 0.57
[ reached 'max' / getOption("max.print") -- omitted 211 rows ]
glass0
class RI Na Mg Al Si K Ca Ba
1 positive 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931 8.04468 0
2 positive 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205 8.52888 0
3 positive 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178 9.57260 0
4 positive 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520 0
5 positive 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132 8.27064 0
6 positive 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694 9.40044 0
7 positive 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132 8.23836 0
Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
[ reached 'max' / getOption("max.print") -- omitted 207 rows ]
glass1
class RI Na Mg Al Si K Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931 8.04468 0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205 8.52888 0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178 9.57260 0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520 0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132 8.27064 0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694 9.40044 0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132 8.23836 0
Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
[ reached 'max' / getOption("max.print") -- omitted 207 rows ]
glass6
class RI Na Mg Al Si K Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931 8.04468 0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205 8.52888 0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178 9.57260 0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520 0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132 8.27064 0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694 9.40044 0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132 8.23836 0
Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
[ reached 'max' / getOption("max.print") -- omitted 207 rows ]
haberman
# A tibble: 306 x 4
class age year nodes
<fct> <dbl> <dbl> <dbl>
1 1 30 64 1
2 1 30 62 3
3 1 30 65 0
4 1 31 59 2
5 1 31 65 4
6 1 33 58 10
7 1 33 60 0
8 2 34 59 0
9 2 34 66 9
10 1 34 58 30
# … with 296 more rows
iris0
class SepalLength SepalWidth PetalLength PetalWidth
1 positive 5.1 3.5 1.4 0.2
2 positive 4.9 3.0 1.4 0.2
3 positive 4.6 3.1 1.5 0.2
4 positive 5.0 3.6 1.4 0.2
5 positive 5.4 3.9 1.7 0.4
6 positive 4.6 3.4 1.4 0.3
7 positive 5.0 3.4 1.5 0.2
8 positive 4.4 2.9 1.4 0.2
9 positive 5.4 3.7 1.5 0.2
10 positive 4.8 3.4 1.6 0.2
11 positive 4.8 3.0 1.4 0.1
12 positive 4.3 3.0 1.1 0.1
13 positive 5.7 4.4 1.5 0.4
14 positive 5.4 3.9 1.3 0.4
15 positive 5.1 3.5 1.4 0.3
[ reached 'max' / getOption("max.print") -- omitted 135 rows ]
wisconsin
class ClumpThickness CellSize CellShape MarginalAdhesion
1 negative 6 1 1 1
2 negative 6 5 5 6
3 negative 4 1 1 1
4 negative 7 9 9 1
5 negative 5 1 1 4
6 positive 9 2 2 9
7 negative 1 1 1 1
EpithelialSize BareNuclei BlandChromatin NormalNucleoli Mitoses
1 3 1 4 1 1
2 8 2 4 3 1
3 3 3 4 1 1
4 4 5 4 8 1
5 3 1 4 1 1
6 8 2 10 8 1
7 3 2 4 1 1
[ reached 'max' / getOption("max.print") -- omitted 676 rows ]
Binary: Imbalance ratio higher than 9
ecoli4
class Mcg Gvh Lip Chg Aac Alm1 Alm2
1 negative 0.49 0.29 0.48 0.5 0.56 0.24 0.35
2 negative 0.07 0.40 0.48 0.5 0.54 0.35 0.44
3 negative 0.56 0.40 0.48 0.5 0.49 0.37 0.46
4 negative 0.59 0.49 0.48 0.5 0.52 0.45 0.36
5 negative 0.23 0.32 0.48 0.5 0.55 0.25 0.35
6 negative 0.67 0.39 0.48 0.5 0.36 0.38 0.46
7 negative 0.29 0.28 0.48 0.5 0.44 0.23 0.34
8 negative 0.21 0.34 0.48 0.5 0.51 0.28 0.39
9 negative 0.20 0.44 0.48 0.5 0.46 0.51 0.57
[ reached 'max' / getOption("max.print") -- omitted 327 rows ]
ecoli_0_1_4_6_vs_5
class a1 a2 a3 a5 a6 a7
1 negative 49 29 48 56 24 35
2 negative 7 4 48 54 35 44
3 negative 56 4 48 49 37 46
4 negative 59 49 48 52 45 36
5 negative 23 32 48 55 25 35
6 negative 67 39 48 36 38 46
7 negative 29 28 48 44 23 34
8 negative 21 34 48 51 28 39
9 negative 2 44 48 46 51 57
10 negative 42 4 48 56 18 3
[ reached 'max' / getOption("max.print") -- omitted 270 rows ]
ecoli_0_1_4_7_vs_2_3_5_6
class a1 a2 a3 a4 a5 a6 a7
1 negative 49 29 48 5 56 24 35
2 negative 7 4 48 5 54 35 44
3 negative 56 4 48 5 49 37 46
4 negative 59 49 48 5 52 45 36
5 negative 23 32 48 5 55 25 35
6 negative 67 39 48 5 36 38 46
7 negative 29 28 48 5 44 23 34
8 negative 21 34 48 5 51 28 39
9 negative 2 44 48 5 46 51 57
[ reached 'max' / getOption("max.print") -- omitted 327 rows ]
ecoli_0_1_4_7_vs_5_6
class a1 a2 a3 a5 a6 a7
1 negative 49 29 48 56 24 35
2 negative 7 4 48 54 35 44
3 negative 56 4 48 49 37 46
4 negative 59 49 48 52 45 36
5 negative 23 32 48 55 25 35
6 negative 67 39 48 36 38 46
7 negative 29 28 48 44 23 34
8 negative 21 34 48 51 28 39
9 negative 2 44 48 46 51 57
10 negative 42 4 48 56 18 3
[ reached 'max' / getOption("max.print") -- omitted 322 rows ]
ecoli_0_6_7_vs_5
class a1 a2 a3 a5 a6 a7
1 negative 49 29 48 56 24 35
2 negative 7 4 48 54 35 44
3 negative 56 4 48 49 37 46
4 negative 59 49 48 52 45 36
5 negative 23 32 48 55 25 35
6 negative 67 39 48 36 38 46
7 negative 29 28 48 44 23 34
8 negative 21 34 48 51 28 39
9 negative 2 44 48 46 51 57
10 negative 42 4 48 56 18 3
[ reached 'max' / getOption("max.print") -- omitted 210 rows ]
glass2
class RI Na Mg Al Si K Ca Ba Fe
1 negative 1.51673 13.30 3.64 1.53 72.53 0.65 8.03 0.00 0.29
2 negative 1.51750 12.82 3.55 1.49 72.75 0.54 8.52 0.00 0.19
3 negative 1.51775 12.85 3.48 1.23 72.97 0.61 8.56 0.09 0.22
4 negative 1.51646 13.41 3.55 1.25 72.81 0.68 8.10 0.00 0.00
5 negative 1.51761 12.81 3.54 1.23 73.24 0.58 8.39 0.00 0.00
6 negative 1.51846 13.41 3.89 1.33 72.38 0.51 8.28 0.00 0.00
7 negative 1.51811 13.33 3.85 1.25 72.78 0.52 8.12 0.00 0.00
[ reached 'max' / getOption("max.print") -- omitted 207 rows ]
glass4
class RI Na Mg Al Si K Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931 8.04468 0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205 8.52888 0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178 9.57260 0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520 0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132 8.27064 0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694 9.40044 0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132 8.23836 0
Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
[ reached 'max' / getOption("max.print") -- omitted 207 rows ]
glass5
class RI Na Mg Al Si K Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931 8.04468 0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205 8.52888 0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178 9.57260 0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520 0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132 8.27064 0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694 9.40044 0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132 8.23836 0
Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
[ reached 'max' / getOption("max.print") -- omitted 207 rows ]
Multiclass
Noisy
Class noise
Attribute noise
Description of the dataset
- Source: UCI Machine Learning Repository
- Classification: multiclass
- Input features: numerical
- Number of rows:
- Number of attributes:
Description of the attributes: